I highly recommend you…
Today we will…
Functions allow you to automate common tasks
We’ve been using functions since Day 1
Did you ever find yourself copy-pasting an analysis and changing small parts?
Writing your OWN functions has 3 big advantages over copy-and-paste:
Your code is easier to read
To change your analysis, simply change the function
No more mistakes from copy-paste
Let’s call the function!
add_two <-
The name of the function is chosen by the author.
The argument(s) of the function are chosen by the author.
We can supply a default argument value, e.g. something = 2
something defaults to 2
{ body }
The body of the function is where the action happens.
return()
Your function will “give back” whatever would normally “print out”.
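Putting the pieces above together, a minimal sketch might look like the following. The body (x + something) is an assumption based on the function's name and the default of 2; the original definition is not shown in this section.

```r
# Hypothetical reconstruction: the body is assumed, not taken from the slides.
add_two <- function(x, something = 2) {
  # return() hands the result back to the caller
  return(x + something)
}

default_result <- add_two(5)                  # `something` falls back to its default, 2
custom_result  <- add_two(5, something = 10)  # the default is overridden
```

Calling add_two(5) without naming something uses the default, while supplying something = 10 replaces it for that call only.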
Recall De Morgan’s law!
add_something <- function(x, something) {
  # Stop early unless both arguments are numeric
  if (!is.numeric(x) | !is.numeric(something)) {
    stop("Please provide numeric inputs for both arguments.")
  }
  x + something
}

add_something(x = 2, something = "R")
Error in add_something(x = 2, something = "R"): Please provide numeric inputs for both arguments.
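De Morgan's law says that "not A or not B" is the same as "not (A and B)", so the input check in add_something() can be written either way. The sketch below confirms that the two forms agree for every combination of numeric and non-numeric inputs; the helper names are made up for illustration.

```r
# Hypothetical helpers: two equivalent ways to write the same input check
fails_check_or  <- function(x, something) !is.numeric(x) | !is.numeric(something)
fails_check_and <- function(x, something) !(is.numeric(x) & is.numeric(something))

# Try every combination of a numeric and a non-numeric input
inputs <- list(1, "R")
agree <- TRUE
for (a in inputs) {
  for (b in inputs) {
    agree <- agree && identical(fails_check_or(a, b), fails_check_and(a, b))
  }
}
agree
```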
If an object doesn’t exist in the function’s environment, the global environment will be searched next; if there is no object in the global environment, the program will error out.
Objects you make in the function don’t affect “real life”.
This is an example of name masking, where names defined inside of a function mask names defined outside of a function.
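A small sketch of these scoping rules, using made-up object and function names:

```r
y <- 10  # defined in the global environment

# `y` is not defined inside this function, so R finds the global one
add_global_y <- function(x) {
  x + y
}

# Here a local `y` masks the global `y` while the function runs
add_local_y <- function(x) {
  y <- 1
  x + y
}

global_lookup <- add_global_y(1)  # uses the global y
masked_lookup <- add_local_y(1)   # uses the local y
y_afterwards  <- y                # the global y is untouched by add_local_y()
```

The assignment y <- 1 inside add_local_y() only exists for the duration of that call, which is why y_afterwards still holds the original value.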
Interactive coding (highlight small pieces of code within your function and run them independently of the rest)
print() Debugging
Rubber Ducking
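As a sketch of print() debugging, temporary print() calls inside a function show what the function actually received. The function below is a made-up example, not one from the slides.

```r
# Hypothetical function with temporary print() calls added for debugging
add_something_debug <- function(x, something = 2) {
  print(class(x))          # what type did we actually get for x?
  print(class(something))  # and for something?
  x + something
}

debug_result <- add_something_debug(3)
```

Once the problem is found, the print() calls can be deleted so the function goes back to returning its result quietly.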
In general…
Write a simple example once (without a function)
Generalize by assigning variables.
Write into a function.
Call the function on desired arguments
find_car_make()
Write a function called find_car_make() that takes the name of a car as input and returns only the “make”, that is, the company that created the car.
Tip
For example, find_car_make("Toyota Camry") should return "Toyota" and find_car_make("Ford Anglica") should return "Ford".
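One possible solution, sketched by following the four steps listed earlier. It assumes the make is always the first word of the name, separated from the rest by a space; using base R's sub() is a choice made here, not necessarily the intended approach.

```r
# 1. Write a simple example once (without a function)
sub(" .*$", "", "Toyota Camry")

# 2. Generalize by assigning variables
car_name <- "Toyota Camry"
sub(" .*$", "", car_name)

# 3. Write it into a function
# Assumption: the make is everything before the first space.
find_car_make <- function(car_name) {
  stopifnot(is.character(car_name))
  sub(" .*$", "", car_name)  # drop the first space and everything after it
}

# 4. Call the function on the desired arguments
toyota <- find_car_make("Toyota Camry")
ford   <- find_car_make("Ford Anglica")
makes  <- find_car_make(c("Mazda RX4 Wag", "Datsun 710"))  # sub() is vectorized
```

Because sub() is vectorized, the same function works on a whole column of car names. A make that itself contains a space would be cut short under this assumption.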
Consider mtcars
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Let’s use our function to create a new column in the data called make that gives the make of each car.
library(dplyr)   # mutate()
library(tibble)  # rownames_to_column()

mtcars |>
  rownames_to_column("make_model") |>
  mutate(make = find_car_make(make_model),
         .after = make_model) |>
  head(n = 3)

     make_model   make  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1     Mazda RX4  Mazda 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 Mazda RX4 Wag  Mazda 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3    Datsun 710 Datsun 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Challenge 7: Incorporating Multiple Inputs
Could our function be more efficient?
Note
Notice how I relied on the existing function std_to_01() inside the new function, for clarity!
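std_to_01() is not defined in this section. A minimal sketch follows, assuming it rescales a numeric vector so that its minimum maps to 0 and its maximum to 1, and that it checks its input with stopifnot() (the error output shown later mentions stopifnot(), which suggests it does).

```r
# Assumed definition: min-max rescaling to the interval [0, 1]
std_to_01 <- function(var) {
  stopifnot(is.numeric(var))
  rng <- range(var, na.rm = TRUE)     # c(minimum, maximum)
  (var - rng[1]) / (rng[2] - rng[1])
}

rescaled <- std_to_01(c(10, 15, 20))
```

Keeping the rescaling in its own small function means any function built on top of it only has to deal with which column to transform.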
Functions that take unquoted variable names as arguments rely on nonstandard evaluation, also known as tidy evaluation.
library(rlang)
In June 2019 (version 0.4.0), rlang introduced the “injection” {{ }} operator to simplify writing functions around tidyverse pipelines.
With the {{ }} operator you can inject the names of data-variables (i.e. columns of the data frame) into function arguments!
Warning
This only works for select()-type functions that use a literal (tidy) name of the variable to subset the data.
std_column_01 <- function(data, variable) {
  data <- data |>
    mutate(
      variable = std_to_01(variable)
    )
  data
}

std_column_01(penguins, body_mass_g)
Error in `mutate()`:
! Problem while computing `variable = std_to_01(variable)`.
Caused by error in `stopifnot()`:
! object 'body_mass_g' not found
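mutate() defuses the R code it was supplied, so it works with the literal name variable instead of the column we passed in.
What we want it to compute is body_mass_g = std_to_01(body_mass_g).
This is why we need injection!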
std_column_01 <- function(data, variable) {
  stopifnot(is.data.frame(data))

  # Try to overwrite the chosen column with its standardized version
  data <- data |>
    mutate({{ variable }} = std_to_01({{ variable }}))
  data
}

Error: <text>:6:27: unexpected '='
5:   data <- data |>
6:     mutate({{ variable }} =
                             ^
Danger
Oh no! What happened?
The left-hand side of = is also defused! R only accepts a plain name there, so {{ variable }} = fails to parse before any code runs.
:=
The “walrus operator” := is an alias of =.
You can use it to supply names, e.g. a := b is equivalent to a = b.
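A sketch of how := allows {{ }} on the left-hand side. The definition of std_to_01() and the toy data frame are assumptions made so the example is self-contained; the slides use the penguins data instead.

```r
library(dplyr)

# Assumed helper: min-max rescaling to [0, 1]
std_to_01 <- function(var) {
  stopifnot(is.numeric(var))
  rng <- range(var, na.rm = TRUE)
  (var - rng[1]) / (rng[2] - rng[1])
}

std_column_01 <- function(data, variable) {
  stopifnot(is.data.frame(data))
  data |>
    # `:=` lets the injected column name appear on the left-hand side
    mutate({{ variable }} := std_to_01({{ variable }}))
}

# Hypothetical toy data standing in for penguins
toy <- data.frame(id = 1:3, body_mass_g = c(3000, 4000, 5000))
result <- std_column_01(toy, body_mass_g)
```

The column is overwritten in place, so result has the same columns as toy with body_mass_g rescaled.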
across()
What if I want to modify multiple columns?
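One way this might look, as a sketch: pass a selection of columns through {{ }} into across(). The function name std_columns_01(), the definition of std_to_01(), and the toy data are all assumptions for illustration.

```r
library(dplyr)

# Assumed helper: min-max rescaling to [0, 1]
std_to_01 <- function(var) {
  stopifnot(is.numeric(var))
  rng <- range(var, na.rm = TRUE)
  (var - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical wrapper: rescale every selected column
std_columns_01 <- function(data, variables) {
  stopifnot(is.data.frame(data))
  data |>
    mutate(across(.cols = {{ variables }}, .fns = std_to_01))
}

# Hypothetical toy data
toy <- data.frame(
  id             = c("a", "b", "c"),
  bill_length_mm = c(40, 45, 50),
  body_mass_g    = c(3000, 4000, 5000)
)
scaled <- std_columns_01(toy, c(bill_length_mm, body_mass_g))
```

Because across() keeps the original column names, no := is needed here; only the selection itself is injected.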
Without inspection:
Observations are “missing completely at random”
With information about the “missingness”:
Observations are “missing at random”
Look for patterns!
If fish length measurements are missing at random, conditional on month, year, and river section,
then the distributions of lengths will be similar for fish of the same month, year, and river section.
Why Scale?
Easier to compare across variables
Easier to model (standardizes variance)
Why not Scale?
Article on How Building Functions with Variable Names has Changed Over the Years